NSF PAR Search | NSF Public Access Repository

Phertilizer: Growing a clonal tree from ultra-low coverage single-cell DNA sequencing of tumors

https://doi.org/10.1371/journal.pcbi.1011544

Weber, Leah L.; Zhang, Chuanyi; Ochoa, Idoia; El-Kebir, Mohammed (October 2023, PLOS Computational Biology)
Przytycka, Teresa M. (Ed.)
Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.
Full Text Available
Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses

https://doi.org/10.1093/molbev/msac133

Zhang, Chuanyi; Sashittal, Palash; Xiang, Michael; Zhang, Yichi; Kazi, Ayesha; El-Kebir, Mohammed (January 2022, Molecular Biology and Evolution)
Leitner, Thomas (Ed.)
Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5’ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID to solve this problem, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.
Full Text Available
Insights from the Biorepository and Integrative Genomics pediatric resource

https://doi.org/10.1038/s41467-025-59375-0

Buonaiuto, Silvia; Marsico, Franco; Mohammed, Akram; Chinthala, Lokesh K; Amos-Abanyie, Ernestine K; Baras, Aris; Abecasis, Goncalo; Ferrando, Adolfo; Coppola, Giovanni; Deubler, Andrew; et al (December 2025, Nature Communications)

The Biorepository and Integrative Genomics (BIG) Initiative in Tennessee has developed a pioneering resource to address gaps in genomic research by linking genomic, phenotypic, and environmental data from a diverse Mid-South population, including underrepresented groups. We analyzed 13,152 exomes from BIG and found significant genetic diversity, with 50% of participants inferred to have non-European or several types of admixed ancestry. Ancestry within the BIG cohort is stratified, with distinct geographic and demographic patterns, as African ancestry is more common in urban areas, while European ancestry is more common in suburban regions. We observe ancestry-specific rates of novel genetic variants, which are enriched for functional or clinical relevance. Disease prevalence analysis linked ancestry and environmental factors, showing higher odds ratios for asthma and obesity in minority groups, particularly in the urban area. Finally, we observe discrepancies between self-reported race and genetic ancestry, with related individuals self-identifying in differing racial categories. These findings underscore the limitations of race as a biomedical variable. BIG has proven to be an effective model for community-centered precision medicine. We integrated genomics education and fostered great trust among the contributing communities. Future goals include cohort expansion and enhanced genomic analysis to ensure equitable healthcare outcomes.
Free, publicly-accessible full text available December 1, 2026
Jumper enables discontinuous transcript assembly in coronaviruses

https://doi.org/10.1038/s41467-021-26944-y

Sashittal, Palash; Zhang, Chuanyi; Peng, Jian; El-Kebir, Mohammed (November 2021, Nature Communications)

Genes in SARS-CoV-2 and other viruses in the order of Nidovirales are expressed by a process of discontinuous transcription which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
Moss enables high sensitivity single-nucleotide variant calling from multiple bulk DNA tumor samples

https://doi.org/10.1038/s41467-021-22466-9

Zhang, Chuanyi; El-Kebir, Mohammed; Ochoa, Idoia (April 2021, Nature Communications)

Intra-tumor heterogeneity renders the identification of somatic single-nucleotide variants (SNVs) a challenging problem. In particular, low-frequency SNVs are hard to distinguish from sequencing artifacts. While the increasing availability of multi-sample tumor DNA sequencing data holds the potential for more accurate variant calling, there is a lack of high-sensitivity multi-sample SNV callers that utilize these data. Here we report Moss, a method to identify low-frequency SNVs that recur in multiple sequencing samples from the same tumor. Moss provides any existing single-sample SNV caller the ability to support multiple samples with little additional time overhead. We demonstrate that Moss improves recall while maintaining high precision in a simulated dataset. On multi-sample hepatocellular carcinoma, acute myeloid leukemia and colorectal cancer datasets, Moss identifies new low-frequency variants that meet manual review criteria and are consistent with the tumor’s mutational signature profile. In addition, Moss detects the presence of variants in more samples of the same tumor than reported by the single-sample caller. Moss’ improved sensitivity in SNV calling will enable more detailed downstream analyses in cancer genomics.
